
Basic spatial operations in Geo Dataframes¶
We will review some important operations for geodataframes. As usual, let's do this:
- Create a repository named: geodf_operations.
- Clone that repo to a local folder in your computer.
- In that local folder in your computer, create a folder named maps and data.
- Download the shapefile of "Brazil - Subnational Administrative Boundaries"(bra_adm_ibge_2020_SHP.zip) from here and save it in the maps folder (you need to unzip the file).
- Download a CSV file with information on the airports in Brazil from this website, Save it in my data folder:
- Commit and push.
Let's remember the contents of the world map from last session:
linkWorldMap="https://github.com/CienciaDeDatosEspacial/intro_geodataframe/raw/main/maps/worldMaps.gpkg"
import geopandas as gpd
from fiona import listlayers
listlayers(linkWorldMap)
['countries', 'rivers', 'cities', 'indicators']
Let's open all the layers
countries=gpd.read_file(linkWorldMap,layer='countries')
rivers=gpd.read_file(linkWorldMap,layer='rivers')
cities=gpd.read_file(linkWorldMap,layer='cities')
indicators=gpd.read_file(linkWorldMap,layer='indicators')
Now, let's see some important spatial operations:
countries.head()
| COUNTRY | geometry | |
|---|---|---|
| 0 | Aruba (Netherlands) | POLYGON ((-69.88223 12.41111, -69.94695 12.436... |
| 1 | Antigua and Barbuda | MULTIPOLYGON (((-61.73889 17.54055, -61.75195 ... |
| 2 | Afghanistan | POLYGON ((61.27656 35.60725, 61.29638 35.62853... |
| 3 | Algeria | POLYGON ((-5.15213 30.18047, -5.13917 30.19236... |
| 4 | Azerbaijan | MULTIPOLYGON (((46.54037 38.87559, 46.49554 38... |
# as DF
countries.iloc[50:,]
| COUNTRY | geometry | |
|---|---|---|
| 50 | Central African Republic | POLYGON ((20.45330 4.52379, 20.45798 4.61931, ... |
| 51 | Cuba | MULTIPOLYGON (((-76.94608 21.45221, -76.88390 ... |
| 52 | Cape Verde | MULTIPOLYGON (((-25.28139 16.91333, -25.29861 ... |
| 53 | Cook Islands (New Zealand) | MULTIPOLYGON (((-165.84167 -10.89084, -165.848... |
| 54 | Cyprus | POLYGON ((33.27229 34.70955, 33.21722 34.69944... |
| ... | ... | ... |
| 247 | South Sudan | POLYGON ((34.21807 9.96458, 34.20722 9.90500, ... |
| 248 | Indonesia | MULTIPOLYGON (((123.21846 -10.80917, 123.19832... |
| 249 | East Timor | MULTIPOLYGON (((124.41824 -9.30010, 124.40446 ... |
| 250 | Curacao (Netherlands) | POLYGON ((-68.96556 12.19889, -68.91196 12.181... |
| 251 | Bonaire (Netherlands) | POLYGON ((-68.19736 12.22264, -68.19292 12.207... |
202 rows × 2 columns
# as DF
countries.loc[50:,'geometry']
50 POLYGON ((20.45330 4.52379, 20.45798 4.61931, ...
51 MULTIPOLYGON (((-76.94608 21.45221, -76.88390 ...
52 MULTIPOLYGON (((-25.28139 16.91333, -25.29861 ...
53 MULTIPOLYGON (((-165.84167 -10.89084, -165.848...
54 POLYGON ((33.27229 34.70955, 33.21722 34.69944...
...
247 POLYGON ((34.21807 9.96458, 34.20722 9.90500, ...
248 MULTIPOLYGON (((123.21846 -10.80917, 123.19832...
249 MULTIPOLYGON (((124.41824 -9.30010, 124.40446 ...
250 POLYGON ((-68.96556 12.19889, -68.91196 12.181...
251 POLYGON ((-68.19736 12.22264, -68.19292 12.207...
Name: geometry, Length: 202, dtype: geometry
But as a GeDF, you can also filter using a coordinate point via cx. Let me get the bounding box of the map:
countries.total_bounds
array([-180. , -90. , 180. , 83.62359619])
As you are getting [minx, miny, maxx, maxy] let me select a valid coordinate, i.e. (0,0)
countries.cx[:0,:0]
| COUNTRY | geometry | |
|---|---|---|
| 9 | American Samoa (US) | POLYGON ((-170.74390 -14.37556, -170.74942 -14... |
| 10 | Argentina | MULTIPOLYGON (((-71.01648 -36.47591, -70.98195... |
| 14 | Antarctica | MULTIPOLYGON (((-45.14528 -60.76611, -45.15639... |
| 24 | Bolivia | POLYGON ((-62.19884 -20.47139, -62.26945 -20.5... |
| 29 | Brazil | MULTIPOLYGON (((-70.62862 -9.94849, -70.62889 ... |
| 42 | Chile | MULTIPOLYGON (((-73.61806 -51.63390, -73.60494... |
| 47 | Colombia | MULTIPOLYGON (((-81.71306 12.49028, -81.72014 ... |
| 53 | Cook Islands (New Zealand) | MULTIPOLYGON (((-165.84167 -10.89084, -165.848... |
| 58 | Jarvis Island (US) | POLYGON ((-160.02115 -0.39806, -160.02811 -0.3... |
| 60 | Ecuador | MULTIPOLYGON (((-78.70903 -4.58479, -78.72348 ... |
| 71 | Fiji | MULTIPOLYGON (((180.00000 -16.17274, 179.98621... |
| 72 | Falkland Islands (UK) | MULTIPOLYGON (((-59.79139 -51.24945, -59.73195... |
| 75 | French Polynesia (France) | MULTIPOLYGON (((-140.17783 -8.95639, -140.1894... |
| 158 | Niue (New Zealand) | POLYGON ((-169.89389 -19.14556, -169.93088 -19... |
| 169 | New Zealand | MULTIPOLYGON (((177.91779 -38.94280, 177.90970... |
| 170 | Paraguay | POLYGON ((-57.67267 -25.29430, -57.70639 -25.2... |
| 171 | Pitcairn Islands (UK) | MULTIPOLYGON (((-128.33221 -24.32727, -128.326... |
| 172 | Peru | POLYGON ((-69.56750 -10.95056, -69.56844 -10.9... |
| 196 | St. Helena (UK) | POLYGON ((-5.71298 -15.99286, -5.72917 -16.005... |
| 208 | South Georgia and the South Sandwich Is (UK) | MULTIPOLYGON (((-36.99139 -54.35056, -36.99973... |
| 216 | Tokelau (New Zealand) | POLYGON ((-171.84805 -9.21889, -171.85886 -9.2... |
| 217 | Tonga | MULTIPOLYGON (((-173.93921 -18.56889, -173.933... |
| 231 | Uruguay | POLYGON ((-58.38889 -33.42250, -58.41590 -33.4... |
| 239 | Wallis and Futuna (France) | MULTIPOLYGON (((-176.16504 -13.35305, -176.169... |
| 242 | Western Samoa | MULTIPOLYGON (((-172.59650 -13.50911, -172.551... |
#then
countries.cx[:0,:0].plot()
<Axes: >
Notice cx would be cleaner if spatial element is a point.
Let me keep one country:
brazil=countries[countries.COUNTRY=='Brazil']
Clipping¶
But you can also subset by clipping, as sometimes other data frames may not have the same fields for filtering:
citiesBrazil_clipped = gpd.clip(gdf=cities,
mask=brazil)
riversBrazil_clipped = gpd.clip(gdf=rivers,
mask=brazil)
Then, you can plot the clipped version:
base = brazil.plot(facecolor="greenyellow", edgecolor='black', linewidth=0.4,figsize=(5,5))
citiesBrazil_clipped.plot(marker='+', color='red', markersize=15,
ax=base)
riversBrazil_clipped.plot(edgecolor='blue', linewidth=0.5,
ax=base)
<Axes: >
The geometry types are not modified:
set(brazil.geom_type), set(citiesBrazil_clipped.geom_type), set(riversBrazil_clipped.geom_type)
({'MultiPolygon'}, {'Point'}, {'LineString', 'MultiLineString'})
Exercise 1¶
- Follow the same steps in this last section to plot three maps of one country. Do not use Brazil.
- Plot your three layers.
brazil.crs
<Geographic 2D CRS: EPSG:4326> Name: WGS 84 Axis Info [ellipsoidal]: - Lat[north]: Geodetic latitude (degree) - Lon[east]: Geodetic longitude (degree) Area of Use: - name: World. - bounds: (-180.0, -90.0, 180.0, 90.0) Datum: World Geodetic System 1984 ensemble - Ellipsoid: WGS 84 - Prime Meridian: Greenwich
# check units of measurement
brazil.crs.axis_info
[Axis(name=Geodetic latitude, abbrev=Lat, direction=north, unit_auth_code=EPSG, unit_code=9122, unit_name=degree), Axis(name=Geodetic longitude, abbrev=Lon, direction=east, unit_auth_code=EPSG, unit_code=9122, unit_name=degree)]
# is this CRS projected?
brazil.crs.is_projected
False
Polygons have a centroid. When we try getting a centroid from an unprojected polygon, you get:
# centroid
brazil.centroid
<ipython-input-16-7f66e1ea2110>:2: UserWarning: Geometry is in a geographic CRS. Results from 'centroid' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation. brazil.centroid
29 POINT (-53.09009 -10.77302) dtype: geometry
# recommended for Brazil (meters)
brazil.to_crs(5641).crs.axis_info
[Axis(name=Easting, abbrev=X, direction=east, unit_auth_code=EPSG, unit_code=9001, unit_name=metre), Axis(name=Northing, abbrev=Y, direction=north, unit_auth_code=EPSG, unit_code=9001, unit_name=metre)]
# now this works with no warning
brazil.to_crs(5641).centroid
29 POINT (3884486.179 8756856.093) dtype: geometry
# replotting:
base5641=brazil.to_crs(5641).plot()
brazil.to_crs(5641).centroid.plot(color='red',ax=base5641)
<Axes: >
Let's keep the projected version for all our maps:
brazil_5641=brazil.to_crs(5641)
cities_brazil_5641=citiesBrazil_clipped.to_crs(brazil_5641.crs)
rivers_brazil_5641=riversBrazil_clipped.to_crs(brazil_5641.crs)
from google.colab import drive
drive.mount('/content/drive/')
Mounted at /content/drive/
## saving
import os
maps = '/content/drive/MyDrive/Colab Notebooks/maps'
brazil_5641.to_file(os.path.join(maps,"brazilMaps_5641.gpkg"), layer='country', driver="GPKG")
cities_brazil_5641.to_file(os.path.join(maps,"brazilMaps_5641.gpkg"), layer='cities', driver="GPKG")
rivers_brazil_5641.to_file(os.path.join(maps,"brazilMaps_5641.gpkg"), layer='rivers', driver="GPKG")
brazil_5641.centroid.to_file(os.path.join(maps,"brazilMaps_5641.gpkg"), layer='centroid', driver="GPKG")
Exercise 2¶
- Reproject your country's map layers.
- Plot the reprojected layers
- Save the reprojected layers
Creating Spatial data¶
You will get Lines and Polygons as maps for sure, but that may not be the case with points. Let's open the CSV with information on the airports in Brazil:
import pandas as pd
data = '/content/drive/MyDrive/Colab Notebooks/data'
infoairports=pd.read_csv(os.path.join(data,"br-airports.csv"))
# some rows
infoairports.iloc[[0,1,2,3,-4,-3,-2,-1],:] #head and tail
| id | ident | type | name | latitude_deg | longitude_deg | elevation_ft | continent | country_name | iso_country | ... | municipality | scheduled_service | gps_code | iata_code | local_code | home_link | wikipedia_link | keywords | score | last_updated | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | #meta +id | #meta +code | #loc +airport +type | #loc +airport +name | #geo +lat | #geo +lon | #geo +elevation +ft | #region +continent +code | #country +name | #country +code +iso2 | ... | #loc +municipality +name | #status +scheduled | #loc +airport +code +gps | #loc +airport +code +iata | #loc +airport +code +local | #meta +url +airport | #meta +url +wikipedia | #meta +keywords | #meta +score | #date +updated |
| 1 | 5910 | SBGR | large_airport | Guarulhos - Governador André Franco Montoro In... | -23.431944 | -46.467778 | 2461 | SA | Brazil | BR | ... | São Paulo | 1 | SBGR | GRU | SP0002 | http://www.aeroportoguarulhos.net/ | https://en.wikipedia.org/wiki/S%C3%A3o_Paulo-G... | Cumbica | 1016675 | 2021-10-28T15:52:55+00:00 |
| 2 | 5906 | SBGL | large_airport | Rio Galeão – Tom Jobim International Airport | -22.809999 | -43.250557 | 28 | SA | Brazil | BR | ... | Rio De Janeiro | 1 | SBGL | GIG | RJ0001 | https://www.riogaleao.com/passageiros | https://en.wikipedia.org/wiki/Rio_de_Janeiro-G... | Galeão - Antônio Carlos Jobim International Ai... | 51475 | 2024-05-09T15:04:08+00:00 |
| 3 | 5974 | SBSP | medium_airport | Congonhas Airport | -23.627657 | -46.654601 | 2631 | SA | Brazil | BR | ... | São Paulo | 1 | SBSP | CGH | SP0001 | http://www.infraero.gov.br/usa/aero_prev_home.... | https://en.wikipedia.org/wiki/Congonhas-S%C3%A... | http://www.infraero.gov.br/usa/aero_prev_home.... | 750 | 2022-05-03T20:10:35+00:00 |
| 7082 | 309669 | SSVR | closed | Volta Redonda Airport | -22.4978 | -44.085 | 1245 | SA | Brazil | BR | ... | Volta Redonda | 0 | NaN | NaN | NaN | NaN | NaN | SSVR, SSVR, QVR | 0 | 2013-09-28T14:52:12+00:00 |
| 7083 | 341727 | BR-1429 | heliport | Santa Helena Heliport | -23.59851 | -47.441196 | 2254 | SA | Brazil | BR | ... | Votorantim | 0 | SWHE | NaN | SP0807 | NaN | NaN | NaN | 0 | 2021-03-07T10:30:07+00:00 |
| 7084 | 343017 | BR-1493 | heliport | Bandeiras Centro Empresarial Heliport | -23.536615 | -47.449475 | 1827 | SA | Brazil | BR | ... | Votorantim | 0 | SWST | NaN | SP1306 | NaN | NaN | NaN | 0 | 2021-04-14T20:12:01+00:00 |
| 7085 | 509863 | SN3D | heliport | Alphaville Nova Esplanada 2 Heliport | -23.558971 | -47.473779 | 2083 | SA | Brazil | BR | ... | Votorantim | 0 | SN3D | NaN | SP1453 | NaN | NaN | NaN | 0 | 2023-06-30T14:01:13+00:00 |
8 rows × 23 columns
This needs some cleaning:
# bye first row
infoairports.drop(index=0,inplace=True)
infoairports.reset_index(drop=True, inplace=True)
infoairports.head()
| id | ident | type | name | latitude_deg | longitude_deg | elevation_ft | continent | country_name | iso_country | ... | municipality | scheduled_service | gps_code | iata_code | local_code | home_link | wikipedia_link | keywords | score | last_updated | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5910 | SBGR | large_airport | Guarulhos - Governador André Franco Montoro In... | -23.431944 | -46.467778 | 2461 | SA | Brazil | BR | ... | São Paulo | 1 | SBGR | GRU | SP0002 | http://www.aeroportoguarulhos.net/ | https://en.wikipedia.org/wiki/S%C3%A3o_Paulo-G... | Cumbica | 1016675 | 2021-10-28T15:52:55+00:00 |
| 1 | 5906 | SBGL | large_airport | Rio Galeão – Tom Jobim International Airport | -22.809999 | -43.250557 | 28 | SA | Brazil | BR | ... | Rio De Janeiro | 1 | SBGL | GIG | RJ0001 | https://www.riogaleao.com/passageiros | https://en.wikipedia.org/wiki/Rio_de_Janeiro-G... | Galeão - Antônio Carlos Jobim International Ai... | 51475 | 2024-05-09T15:04:08+00:00 |
| 2 | 5974 | SBSP | medium_airport | Congonhas Airport | -23.627657 | -46.654601 | 2631 | SA | Brazil | BR | ... | São Paulo | 1 | SBSP | CGH | SP0001 | http://www.infraero.gov.br/usa/aero_prev_home.... | https://en.wikipedia.org/wiki/Congonhas-S%C3%A... | http://www.infraero.gov.br/usa/aero_prev_home.... | 750 | 2022-05-03T20:10:35+00:00 |
| 3 | 5967 | SBRJ | medium_airport | Santos Dumont Airport | -22.9105 | -43.163101 | 11 | SA | Brazil | BR | ... | Rio de Janeiro | 1 | SBRJ | SDU | RJ0002 | https://www4.infraero.gov.br/aeroportos/aeropo... | https://en.wikipedia.org/wiki/Santos_Dumont_Ai... | RIO | 750 | 2022-03-28T23:27:00+00:00 |
| 4 | 5872 | SBBR | large_airport | Presidente Juscelino Kubitschek International ... | -15.869167 | -47.920834 | 3497 | SA | Brazil | BR | ... | Brasília | 1 | SBBR | BSB | DF0001 | http://www.infraero.gov.br/usa/aero_prev_home.... | https://en.wikipedia.org/wiki/Bras%C3%ADlia_In... | NaN | 51275 | 2020-08-24T11:15:12+00:00 |
5 rows × 23 columns
# keep the columns needed
infoairports.columns
Index(['id', 'ident', 'type', 'name', 'latitude_deg', 'longitude_deg',
'elevation_ft', 'continent', 'country_name', 'iso_country',
'region_name', 'iso_region', 'local_region', 'municipality',
'scheduled_service', 'gps_code', 'iata_code', 'local_code', 'home_link',
'wikipedia_link', 'keywords', 'score', 'last_updated'],
dtype='object')
keep=['name','type','latitude_deg', 'longitude_deg','elevation_ft','region_name','municipality']
infoairports=infoairports.loc[:,keep]
infoairports.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 7085 entries, 0 to 7084 Data columns (total 7 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 name 7085 non-null object 1 type 7085 non-null object 2 latitude_deg 7085 non-null object 3 longitude_deg 7085 non-null object 4 elevation_ft 6915 non-null object 5 region_name 7085 non-null object 6 municipality 7060 non-null object dtypes: object(7) memory usage: 387.6+ KB
Some formatting:
numericCols=['latitude_deg', 'longitude_deg','elevation_ft']
infoairports[numericCols]=infoairports.loc[:,numericCols].apply(lambda x:pd.to_numeric(x))
# now
infoairports.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 7085 entries, 0 to 7084 Data columns (total 7 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 name 7085 non-null object 1 type 7085 non-null object 2 latitude_deg 7085 non-null float64 3 longitude_deg 7085 non-null float64 4 elevation_ft 6915 non-null float64 5 region_name 7085 non-null object 6 municipality 7060 non-null object dtypes: float64(3), object(4) memory usage: 387.6+ KB
# let's plot
base = brazil_5641.plot(color='white', edgecolor='black') #unprojected
infoairports.plot.scatter(x = 'longitude_deg', y = 'latitude_deg',ax=base)
<Axes: xlabel='longitude_deg', ylabel='latitude_deg'>
Why is it wrong?
airports=gpd.GeoDataFrame(data=infoairports.copy(),
geometry=gpd.points_from_xy(infoairports.longitude_deg,
infoairports.latitude_deg),
crs=brazil.crs.to_epsg())# the coordinates were in degrees - unprojected
# let's plot
base = brazil_5641.plot(color='white', edgecolor='black')
airports.plot(ax=base)
<Axes: >
#remember:
type(airports), type(infoairports)
(geopandas.geodataframe.GeoDataFrame, pandas.core.frame.DataFrame)
Let's re project!
airports_5641=airports.to_crs(5641)
## then
base = brazil_5641.plot(color='white', edgecolor='black')
airports_5641.plot(ax=base)
<Axes: >
Remember you have type of airports:
airports_5641['type'].value_counts() # this will not work: airports.type.value_counts()
type small_airport 4880 heliport 1787 closed 282 medium_airport 124 large_airport 11 seaplane_base 1 Name: count, dtype: int64
We may use that in the future. For now, just rename the type column to a different one.
airports_5641.rename(columns={'type':'kind'},inplace=True)
## adding the airports to GPKG
airports_5641.to_file(os.path.join(maps,"brazilMaps_5641.gpkg"), layer='airports', driver="GPKG")
Formating Geoseries projections¶
You know brazil_5641 is a multipolygon:
brazil_5641
| COUNTRY | geometry | |
|---|---|---|
| 29 | Brazil | MULTIPOLYGON (((1926257.542 8894978.397, 19262... |
Sometime, you just need the border (lines):
brazil_5641.boundary
29 MULTILINESTRING ((1926257.542 8894978.397, 192... dtype: geometry
# This is just the borderline
brazil_5641.boundary.plot()
<Axes: >
Always check the data type:
# does 'boundary' return a GDF?
type(brazil_5641.boundary)
geopandas.geoseries.GeoSeries
def __init__(data=None, index=None, crs=None, **kwargs)
A Series object designed to store shapely geometry objects. Parameters ---------- data : array-like, dict, scalar value The geometries to store in the GeoSeries. index : array-like or Index The index for the GeoSeries. crs : value (optional) Coordinate Reference System of the geometry objects. Can be anything accepted by :meth:`pyproj.CRS.from_user_input() <pyproj.crs.CRS.from_user_input>`, such as an authority string (eg "EPSG:4326") or a WKT string. kwargs Additional arguments passed to the Series constructor, e.g. ``name``. Examples -------- >>> from shapely.geometry import Point >>> s = geopandas.GeoSeries([Point(1, 1), Point(2, 2), Point(3, 3)]) >>> s 0 POINT (1.00000 1.00000) 1 POINT (2.00000 2.00000) 2 POINT (3.00000 3.00000) dtype: geometry >>> s = geopandas.GeoSeries( ... [Point(1, 1), Point(2, 2), Point(3, 3)], crs="EPSG:3857" ... ) >>> s.crs # doctest: +SKIP <Projected CRS: EPSG:3857> Name: WGS 84 / Pseudo-Mercator Axis Info [cartesian]: - X[east]: Easting (metre) - Y[north]: Northing (metre) Area of Use: - name: World - 85°S to 85°N - bounds: (-180.0, -85.06, 180.0, 85.06) Coordinate Operation: - name: Popular Visualisation Pseudo-Mercator - method: Popular Visualisation Pseudo Mercator Datum: World Geodetic System 1984 - Ellipsoid: WGS 84 - Prime Meridian: Greenwich >>> s = geopandas.GeoSeries( ... [Point(1, 1), Point(2, 2), Point(3, 3)], index=["a", "b", "c"], crs=4326 ... ) >>> s a POINT (1.00000 1.00000) b POINT (2.00000 2.00000) c POINT (3.00000 3.00000) dtype: geometry >>> s.crs <Geographic 2D CRS: EPSG:4326> Name: WGS 84 Axis Info [ellipsoidal]: - Lat[north]: Geodetic latitude (degree) - Lon[east]: Geodetic longitude (degree) Area of Use: - name: World. - bounds: (-180.0, -90.0, 180.0, 90.0) Datum: World Geodetic System 1984 ensemble - Ellipsoid: WGS 84 - Prime Meridian: Greenwich See Also -------- GeoDataFrame pandas.Series
Some operations in geopandas require GDF or GS. If you need a GDF instead of a GS:
# converting into GDF
brazil_5641.boundary.to_frame()
| 0 | |
|---|---|
| 29 | MULTILINESTRING ((1926257.542 8894978.397, 192... |
Notice you get a very simple GDF, and you may want to add some information:
# conversion
brazil_border=brazil_5641.boundary.to_frame()
# new column (optional)
brazil_border['name']='Brazil'
# renaming the geometry column
brazil_border.rename(columns={0:'geometry'},inplace=True)
#setting the geometry (the name is not enough)
brazil_border = brazil_border.set_geometry("geometry")
# verifying:
brazil_border.crs
<Projected CRS: EPSG:5641> Name: SIRGAS 2000 / Brazil Mercator Axis Info [cartesian]: - X[east]: Easting (metre) - Y[north]: Northing (metre) Area of Use: - name: Brazil - offshore - equatorial margin. - bounds: (-51.64, -5.74, -32.43, 7.04) Coordinate Operation: - name: Petrobras Mercator - method: Mercator (variant B) Datum: Sistema de Referencia Geocentrico para las AmericaS 2000 - Ellipsoid: GRS 1980 - Prime Meridian: Greenwich
You can add this GDF as a layer.
Exercise 4¶
Check if your country is a polygon or multipolygon.
Recover just the boundaries of that country.
Turn the boundary into a GDF.
Maps Lacking CRS information¶
Reprojecting seems a simple process, but you might find some interesting cases. Let's read the maps on states and municipalities:
brazil_states=gpd.read_file(os.path.join(maps,"bra_adm_ibge_2020_shp","bra_admbnda_adm1_ibge_2020.shp"))
brazil_municipalities=gpd.read_file(os.path.join(maps,"bra_adm_ibge_2020_shp","bra_admbnda_adm2_ibge_2020.shp"))
They are maps, for sure:
type(brazil_states), type(brazil_municipalities)
(geopandas.geodataframe.GeoDataFrame, geopandas.geodataframe.GeoDataFrame)
brazil_states.geometry.head()
0 MULTIPOLYGON (((-68.87747 -11.01987, -68.88027... 1 POLYGON ((-35.46317 -8.82467, -35.46457 -8.828... 2 MULTIPOLYGON (((-50.46147 2.11133, -50.45627 2... 3 MULTIPOLYGON (((-58.49367 -0.84197, -58.48917 ... 4 MULTIPOLYGON (((-38.70687 -17.96447, -38.70867... Name: geometry, dtype: geometry
brazil_municipalities.geometry.head()
0 POLYGON ((-62.04950 -11.86731, -62.04559 -11.8... 1 POLYGON ((-62.42279 -9.80481, -62.42688 -9.806... 2 POLYGON ((-60.37329 -13.40887, -60.37323 -13.4... 3 POLYGON ((-61.00061 -10.99219, -61.00061 -11.0... 4 POLYGON ((-61.20642 -13.08759, -61.20282 -13.0... Name: geometry, dtype: geometry
But, notice this:
brazil_states.crs, brazil_municipalities.crs
(None, None)
They do not have crs information, however they can be plotted:
import matplotlib.pyplot as plt
fig, (ax1, ax2) = plt.subplots(ncols=2, sharex=False, sharey=False, figsize=(12,12))
brazil_states.plot(ax=ax1, facecolor='lightgrey', edgecolor='black')
brazil_municipalities.plot(ax=ax2, facecolor='lightgrey', edgecolor='black',linewidth=0.2)
<Axes: >
Since we are using the crs 5641 for Brazil, the initial strategy could be to set the CRS with the right projection :
brazil_states.to_crs(5641)
--------------------------------------------------------------------------- ValueError Traceback (most recent call last) <ipython-input-54-f2da4e029208> in <cell line: 1>() ----> 1 brazil_states.to_crs(5641) /usr/local/lib/python3.10/dist-packages/geopandas/geodataframe.py in to_crs(self, crs, epsg, inplace) 1422 else: 1423 df = self.copy() -> 1424 geom = df.geometry.to_crs(crs=crs, epsg=epsg) 1425 df.geometry = geom 1426 if not inplace: /usr/local/lib/python3.10/dist-packages/geopandas/geoseries.py in to_crs(self, crs, epsg) 1132 """ 1133 return GeoSeries( -> 1134 self.values.to_crs(crs=crs, epsg=epsg), index=self.index, name=self.name 1135 ) 1136 /usr/local/lib/python3.10/dist-packages/geopandas/array.py in to_crs(self, crs, epsg) 790 """ 791 if self.crs is None: --> 792 raise ValueError( 793 "Cannot transform naive geometries. " 794 "Please set a crs on the object first." ValueError: Cannot transform naive geometries. Please set a crs on the object first.
Python says "Please set a crs on the object first". This would mean to know the actual projection, of the geometry:
From the plots above and the rows seen, we conclude the maps are unprojected; then:
# set as unprojected
brazil_states.crs = "EPSG:4326"
brazil_municipalities.crs = "EPSG:4326"
Now, we can reproject:
brazil_states=brazil_states.to_crs(5641)
brazil_municipalities=brazil_municipalities.to_crs(5641)
Exercise 5 (conditional)¶
- Look for sub administrative divisions of your country
# https://data.humdata.org/dataset/whosonfirst-data-admin-dnk
denmark_states=gpd.read_file(os.path.join(maps,"whosonfirst-data-admin-dk-latest","whosonfirst-data-admin-dk-region-polygon.shp"))
denmark_municipalities=gpd.read_file(os.path.join(maps,"whosonfirst-data-admin-dk-latest","whosonfirst-data-admin-dk-localadmin-polygon.shp"))
from google.colab import drive
drive.mount('/content/drive')
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
fig, (ax1, ax2) = plt.subplots(ncols=2, sharex=False, sharey=False, figsize=(12,12))
denmark_states.cx[0:,:].plot(ax=ax1, facecolor='lightgrey', edgecolor='black')
denmark_municipalities.plot(ax=ax2, facecolor='lightgrey', edgecolor='black',linewidth=0.2)
<Axes: >
- Check all the CRSs of those divisions
print(type(denmark_states), type(denmark_municipalities)) #vemos que es un gdf
print(denmark_states.geometry.head()) # es multipoligonal
print(denmark_states.crs, denmark_municipalities.crs) # sí tiene crs
<class 'geopandas.geodataframe.GeoDataFrame'> <class 'geopandas.geodataframe.GeoDataFrame'> 0 MULTIPOLYGON (((-6.91145 61.62627, -6.89056 61... 1 MULTIPOLYGON (((8.40776 55.45601, 8.40627 55.4... 2 MULTIPOLYGON (((14.70088 55.14588, 14.69875 55... 3 MULTIPOLYGON (((10.98130 54.83016, 10.97817 54... 4 MULTIPOLYGON (((11.02050 57.24190, 11.06478 57... Name: geometry, dtype: geometry EPSG:4326 EPSG:4326
print(denmark_states.crs.axis_info, '\n', denmark_municipalities.crs.axis_info) # ESTÁ EN POLARES, DEBEMOS TRANSFORMAR LUEGO A METROS
print(denmark_states.crs, denmark_municipalities.crs) # sí tiene crs
[Axis(name=Geodetic latitude, abbrev=Lat, direction=north, unit_auth_code=EPSG, unit_code=9122, unit_name=degree), Axis(name=Geodetic longitude, abbrev=Lon, direction=east, unit_auth_code=EPSG, unit_code=9122, unit_name=degree)] [Axis(name=Geodetic latitude, abbrev=Lat, direction=north, unit_auth_code=EPSG, unit_code=9122, unit_name=degree), Axis(name=Geodetic longitude, abbrev=Lon, direction=east, unit_auth_code=EPSG, unit_code=9122, unit_name=degree)] EPSG:4326 EPSG:4326
denmark = countries[countries.COUNTRY=='Dinamarca']
denmark.crs #4326
<Geographic 2D CRS: EPSG:4326> Name: WGS 84 Axis Info [ellipsoidal]: - Lat[north]: Geodetic latitude (degree) - Lon[east]: Geodetic longitude (degree) Area of Use: - name: World. - bounds: (-180.0, -90.0, 180.0, 90.0) Datum: World Geodetic System 1984 ensemble - Ellipsoid: WGS 84 - Prime Meridian: Greenwich
- If you find one CRS is missing, fill the CRS with the right projection.
# ya tienen su CRS, lo que nos falta es hacer la correcta proyección a un crs favorable
# tomamos "FEH2010 / Fehmarnbelt TM" con "EPSG:5596"
denmark_states = denmark_states.to_crs(5596)
denmark_municipalities = denmark_municipalities.to_crs(5596)
brazil_municipalities.plot(facecolor='lightgrey', edgecolor='black',linewidth=0.2)
<Axes: >
#see
brazil_municipalities.head()
| ADM0_EN | ADM0_PT | ADM0_PCODE | ADM1_PT | ADM1_PCODE | ADM2_PT | ADM2_PCODE | ET_ID | geometry | |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Brazil | Brasil | BR | Rondônia | BR11 | Alta Floresta D'Oeste | BR1100015 | 0 | POLYGON ((2880702.575 8678970.180, 2881137.154... |
| 1 | Brazil | Brasil | BR | Rondônia | BR11 | Ariquemes | BR1100023 | 1 | POLYGON ((2839173.154 8911097.984, 2838718.204... |
| 2 | Brazil | Brasil | BR | Rondônia | BR11 | Cabixi | BR1100031 | 2 | POLYGON ((3067184.343 8504324.153, 3067191.133... |
| 3 | Brazil | Brasil | BR | Rondônia | BR11 | Cacoal | BR1100049 | 3 | POLYGON ((2997393.730 8777661.276, 2997393.730... |
| 4 | Brazil | Brasil | BR | Rondônia | BR11 | Cerejeiras | BR1100056 | 4 | POLYGON ((2974496.868 8540812.592, 2974897.495... |
# higher level
brazil_municipalities.ADM1_PT.value_counts()
ADM1_PT Minas Gerais 853 São Paulo 645 Rio Grande do Sul 499 Bahia 417 Paraná 399 Santa Catarina 295 Goiás 246 Piauí 224 Paraíba 223 Maranhão 217 Pernambuco 185 Ceará 184 Rio Grande do Norte 167 Pará 144 Mato Grosso 141 Tocantins 139 Alagoas 102 Rio de Janeiro 92 Mato Grosso do Sul 79 Espírito Santo 78 Sergipe 75 Amazonas 62 Rondônia 52 Acre 22 Amapá 16 Roraima 15 Distrito Federal 1 Name: count, dtype: int64
Then, this is Minas Gerais:
brazil_municipalities[brazil_municipalities.ADM1_PT=='Minas Gerais'].plot()
<Axes: >
Unary UNION¶
We can combine all these polygons into one:
brazil_municipalities[brazil_municipalities.ADM1_PT=='Minas Gerais'].unary_union
Let's save that result:
MinasGerais_union=brazil_municipalities[brazil_municipalities.ADM1_PT=='Minas Gerais'].unary_union
# what do we have?
type(MinasGerais_union)
shapely.geometry.multipolygon.MultiPolygon
You can turn that shapely object into a GeoDF like this:
gpd.GeoDataFrame(index=[0],data={'ADM':'Minas Gerais'},
crs=brazil_municipalities.crs,
geometry=[MinasGerais_union])
| ADM | geometry | |
|---|---|---|
| 0 | Minas Gerais | MULTIPOLYGON (((4613768.235 7444013.818, 46132... |
brazil_municipalities[brazil_municipalities.ADM1_PT=='Minas Gerais'].dissolve().plot()
<Axes: >
Let me save the result, and see the type :
MinasGerais_dissolve=brazil_municipalities[brazil_municipalities.ADM1_PT=='Minas Gerais'].dissolve()
# we got?
type(MinasGerais_dissolve)
geopandas.geodataframe.GeoDataFrame
def __init__(data=None, *args, geometry=None, crs=None, **kwargs)
A GeoDataFrame object is a pandas.DataFrame that has a column with geometry. In addition to the standard DataFrame constructor arguments, GeoDataFrame also accepts the following keyword arguments: Parameters ---------- crs : value (optional) Coordinate Reference System of the geometry objects. Can be anything accepted by :meth:`pyproj.CRS.from_user_input() <pyproj.crs.CRS.from_user_input>`, such as an authority string (eg "EPSG:4326") or a WKT string. geometry : str or array (optional) If str, column to use as geometry. If array, will be set as 'geometry' column on GeoDataFrame. Examples -------- Constructing GeoDataFrame from a dictionary. >>> from shapely.geometry import Point >>> d = {'col1': ['name1', 'name2'], 'geometry': [Point(1, 2), Point(2, 1)]} >>> gdf = geopandas.GeoDataFrame(d, crs="EPSG:4326") >>> gdf col1 geometry 0 name1 POINT (1.00000 2.00000) 1 name2 POINT (2.00000 1.00000) Notice that the inferred dtype of 'geometry' columns is geometry. >>> gdf.dtypes col1 object geometry geometry dtype: object Constructing GeoDataFrame from a pandas DataFrame with a column of WKT geometries: >>> import pandas as pd >>> d = {'col1': ['name1', 'name2'], 'wkt': ['POINT (1 2)', 'POINT (2 1)']} >>> df = pd.DataFrame(d) >>> gs = geopandas.GeoSeries.from_wkt(df['wkt']) >>> gdf = geopandas.GeoDataFrame(df, geometry=gs, crs="EPSG:4326") >>> gdf col1 wkt geometry 0 name1 POINT (1 2) POINT (1.00000 2.00000) 1 name2 POINT (2 1) POINT (2.00000 1.00000) See also -------- GeoSeries : Series object designed to store shapely geometry objects
You got a GEOdf this time:
## see
MinasGerais_dissolve
| geometry | ADM0_EN | ADM0_PT | ADM0_PCODE | ADM1_PT | ADM1_PCODE | ADM2_PT | ADM2_PCODE | ET_ID | |
|---|---|---|---|---|---|---|---|---|---|
| 0 | MULTIPOLYGON (((4613768.235 7444013.818, 46132... | Brazil | Brasil | BR | Minas Gerais | BR31 | Abadia dos Dourados | BR3100104 | 2244 |
# keeping what is relevant
MinasGerais_dissolve.drop(columns=['ADM2_PT','ADM2_PCODE','ET_ID'],inplace=True)
# then
MinasGerais_dissolve
| geometry | ADM0_EN | ADM0_PT | ADM0_PCODE | ADM1_PT | ADM1_PCODE | |
|---|---|---|---|---|---|---|
| 0 | MULTIPOLYGON (((4613768.235 7444013.818, 46132... | Brazil | Brasil | BR | Minas Gerais | BR31 |
b. Dissolve for groups¶
Using dissolve() with no arguments returns the union of the polygons, BUT also you get a GEOdf. However, if you have a column that represents a grouping (as we do), you can dissolve by that column:
# dissolving
brazil_municipalities.dissolve(by='ADM1_PT').plot(facecolor='yellow', edgecolor='black',linewidth=0.2)
<Axes: >
Again, let me save this result:
Brazil_adm1_diss=brazil_municipalities.dissolve(by='ADM1_PT')
We know we have a GeoDF; let's see contents:
Brazil_adm1_diss
| geometry | ADM0_EN | ADM0_PT | ADM0_PCODE | ADM1_PCODE | ADM2_PT | ADM2_PCODE | ET_ID | |
|---|---|---|---|---|---|---|---|---|
| ADM1_PT | ||||||||
| Acre | MULTIPOLYGON (((2060163.783 8784945.389, 20600... | Brazil | Brasil | BR | BR12 | Acrelândia | BR1200013 | 52 |
| Alagoas | POLYGON ((5672327.106 8882538.373, 5672062.284... | Brazil | Brasil | BR | BR27 | Água Branca | BR2700102 | 1650 |
| Amapá | MULTIPOLYGON (((4019970.847 9872122.620, 40190... | Brazil | Brasil | BR | BR16 | Serra do Navio | BR1600055 | 295 |
| Amazonas | MULTIPOLYGON (((1990420.701 9130001.855, 19900... | Brazil | Brasil | BR | BR13 | Alvarães | BR1300029 | 74 |
| Bahia | MULTIPOLYGON (((4662529.316 8340947.682, 46619... | Brazil | Brasil | BR | BR29 | Abaíra | BR2900108 | 1827 |
| Ceará | MULTIPOLYGON (((5251532.758 9270745.656, 52514... | Brazil | Brasil | BR | BR23 | Abaiara | BR2300101 | 891 |
| Distrito Federal | POLYGON ((4513061.426 8221825.541, 4512959.571... | Brazil | Brasil | BR | BR53 | Brasília | BR5300108 | 5571 |
| Espírito Santo | MULTIPOLYGON (((5230048.277 7601262.124, 52298... | Brazil | Brasil | BR | BR32 | Afonso Cláudio | BR3200102 | 3097 |
| Goiás | MULTIPOLYGON (((4095906.674 7824766.730, 40957... | Brazil | Brasil | BR | BR52 | Abadia de Goiás | BR5200050 | 5325 |
| Maranhão | MULTIPOLYGON (((4672225.852 9020212.253, 46724... | Brazil | Brasil | BR | BR21 | Açailândia | BR2100055 | 450 |
| Mato Grosso | MULTIPOLYGON (((3596657.302 8050261.396, 35962... | Brazil | Brasil | BR | BR51 | Acorizal | BR5100102 | 5184 |
| Mato Grosso do Sul | MULTIPOLYGON (((3578221.661 7460077.796, 35777... | Brazil | Brasil | BR | BR50 | Água Clara | BR5000203 | 5105 |
| Minas Gerais | MULTIPOLYGON (((4613768.235 7444013.818, 46132... | Brazil | Brasil | BR | BR31 | Abadia dos Dourados | BR3100104 | 2244 |
| Paraná | MULTIPOLYGON (((3838507.104 6986653.341, 38381... | Brazil | Brasil | BR | BR41 | Abatiá | BR4100103 | 3912 |
| Paraíba | MULTIPOLYGON (((5537757.114 9137226.114, 55371... | Brazil | Brasil | BR | BR25 | Água Branca | BR2500106 | 1242 |
| Pará | MULTIPOLYGON (((4029416.142 8917861.350, 39986... | Brazil | Brasil | BR | BR15 | Abaetetuba | BR1500107 | 151 |
| Pernambuco | MULTIPOLYGON (((5543087.493 8978786.229, 55426... | Brazil | Brasil | BR | BR26 | Abreu e Lima | BR2600054 | 1465 |
| Piauí | POLYGON ((4788787.991 8787535.671, 4788523.170... | Brazil | Brasil | BR | BR22 | Acauã | BR2200053 | 667 |
| Rio Grande do Norte | POLYGON ((5526933.390 9281317.002, 5526247.570... | Brazil | Brasil | BR | BR24 | Acari | BR2400109 | 1075 |
| Rio Grande do Sul | MULTIPOLYGON (((3696834.468 6334740.927, 36961... | Brazil | Brasil | BR | BR43 | Lagoa Mirim | BR4300001 | 4606 |
| Rio de Janeiro | MULTIPOLYGON (((4818916.513 7344536.670, 48187... | Brazil | Brasil | BR | BR33 | Angra dos Reis | BR3300100 | 3175 |
| Rondônia | MULTIPOLYGON (((2786460.123 8567095.355, 27860... | Brazil | Brasil | BR | BR11 | Alta Floresta D'Oeste | BR1100015 | 0 |
| Roraima | POLYGON ((3104062.415 10027970.921, 3103505.61... | Brazil | Brasil | BR | BR14 | Amajari | BR1400027 | 136 |
| Santa Catarina | MULTIPOLYGON (((3902797.582 6876520.458, 39024... | Brazil | Brasil | BR | BR42 | Abdon Batista | BR4200051 | 4311 |
| Sergipe | POLYGON ((5596445.601 8718324.023, 5596255.473... | Brazil | Brasil | BR | BR28 | Amparo do São Francisco | BR2800100 | 1752 |
| São Paulo | MULTIPOLYGON (((4110594.074 7411208.575, 41102... | Brazil | Brasil | BR | BR35 | Adamantina | BR3500105 | 3267 |
| Tocantins | POLYGON ((4304809.984 8561426.853, 4305040.854... | Brazil | Brasil | BR | BR17 | Abreulândia | BR1700251 | 311 |
Again, we can drop columns that do not bring important information:
Brazil_adm1_diss.drop(columns=['ADM2_PT','ADM2_PCODE','ET_ID'],inplace=True)
Brazil_adm1_diss.reset_index(inplace=True)
Brazil_adm1_diss.info()
<class 'geopandas.geodataframe.GeoDataFrame'> RangeIndex: 27 entries, 0 to 26 Data columns (total 6 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 ADM1_PT 27 non-null object 1 geometry 27 non-null geometry 2 ADM0_EN 27 non-null object 3 ADM0_PT 27 non-null object 4 ADM0_PCODE 27 non-null object 5 ADM1_PCODE 27 non-null object dtypes: geometry(1), object(5) memory usage: 1.4+ KB
Exercise 6¶
- Create some subset of polygons with your country data.
- Use Unary UNION with those polygons.
- Create a geoDF with the result.
- Use dissolve with the same polygons, and create a geoDF.
- Create some subset of polygons with your country data.
denmark_municipalities.plot(ax=ax2, facecolor='lightgrey', edgecolor='black',linewidth=0.2)
<Axes: >
<Figure size 640x480 with 0 Axes>
pd.set_option('display.max_columns', None)
denmark_municipalities.head(10)
| id | parent_id | name | placetype | placelocal | country | repo | lat | lon | min_lat | min_lon | max_lat | max_lon | min_zoom | max_zoom | min_label | max_label | modified | is_funky | population | country_id | region_id | county_id | concord_ke | concord_id | iso_code | hasc_id | gn_id | wd_id | name_ara | name_ben | name_deu | name_eng | name_ell | name_fas | name_fra | name_heb | name_hin | name_hun | name_ind | name_ita | name_jpn | name_kor | name_nld | name_pol | name_por | name_rus | name_spa | name_swe | name_tur | name_ukr | name_urd | name_vie | name_zho | geom_src | geometry | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1159297717 | 85682589 | Holbaek | localadmin | kommune | DK | whosonfirst-data-admin-dk | 55.675774 | 11.573848 | 55.508929 | 11.350962 | 55.809169 | 11.858369 | 12 | None | 13.5 | 18 | 2023-09-28 | None | 71541 | 85633121 | 85682589 | None | dk-geodk:code | 316 | None | None | None | None | None | None | None | Holbaek | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | dk-geodk | MULTIPOLYGON (((1028605.296 6171633.116, 10286... |
| 1 | 1159297719 | 85682589 | Stevns | localadmin | kommune | DK | whosonfirst-data-admin-dk | 55.326944 | 12.334101 | 55.235697 | 12.107425 | 55.427477 | 12.456528 | 12 | None | 13.5 | 18 | 2023-09-28 | None | 22805 | 85633121 | 85682589 | None | dk-geodk:code | 336 | None | None | None | None | None | None | None | Stevns | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | dk-geodk | MULTIPOLYGON (((1071296.650 6133778.538, 10712... |
| 2 | 1159297721 | 85682597 | Skanderborg | localadmin | kommune | DK | whosonfirst-data-admin-dk | 56.076149 | 9.872121 | 55.953987 | 9.633145 | 56.216936 | 10.092933 | 12 | None | 13.5 | 18 | 2023-09-28 | None | 62678 | 85633121 | 85682597 | None | dk-geodk:code | 746 | None | None | None | None | None | None | Skanderborg | Skanderborg | None | None | Skanderborg | None | None | None | None | Skanderborg | None | None | Skanderborg | Skanderborg | Skanderborg | Сканнерборг | Skanderborg | Skanderborg | None | None | None | None | 斯坎訥堡自治市 | dk-geodk | POLYGON ((894560.919 6217046.948, 894711.815 6... |
| 3 | 1159297723 | 85682581 | Lyngby-Taarbaek | localadmin | kommune | DK | whosonfirst-data-admin-dk | 55.781735 | 12.510815 | 55.759558 | 12.413286 | 55.809710 | 12.596912 | 12 | None | 13.5 | 18 | 2023-09-28 | None | 56214 | 85633121 | 85682581 | None | dk-geodk:code | 173 | None | None | None | None | None | None | None | Lyngby-Taarbaek | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | dk-geodk | POLYGON ((1078317.333 6187878.034, 1078327.597... |
| 4 | 1159297725 | 85682597 | Herning | localadmin | kommune | DK | whosonfirst-data-admin-dk | 56.203869 | 8.933449 | 55.879568 | 8.433759 | 56.414040 | 9.187847 | 12 | None | 13.5 | 18 | 2023-09-28 | None | 89127 | 85633121 | 85682597 | None | dk-geodk:code | 657 | None | None | None | None | هرنينغ | None | Herning | Herning | None | هرنینگ | Herning | None | None | None | Munisipalitas Herning | Herning | ヘアニング | 헤르닝 | Herning | Herning | None | Хернинг | Herning | Herning | Herning | Гернінг | None | Herning | 海宁 | dk-geodk | MULTIPOLYGON (((832268.655 6226344.658, 832274... |
| 5 | 1159297727 | 85682581 | Copenhagen | localadmin | kommune | DK | whosonfirst-data-admin-dk | 55.698354 | 12.571489 | 55.612853 | 12.452998 | 55.732711 | 12.734255 | 12 | None | 13.5 | 18 | 2023-09-28 | None | 632340 | 85633121 | 85682581 | None | dk-geodk:code | 101 | None | None | None | None | كوبنهاغن | কোপেনহেগেন | Kopenhagen | Copenhagen | Κοπεγχάγη | کپنهاگ | Copenhague | קופנהגן | कोपनहेगन | Koppenhága | Kopenhagen | Copenaghen | コペンハーゲン | 코펜하겐 | Kopenhagen | Kopenhaga | Copenhaga | Копенгаген | Copenhague | Köpenhamn | Kopenhag | Копенгаген | کوپن ہیگن | Copenhagen | 哥本哈根 | dk-geodk | MULTIPOLYGON (((1088066.280 6176427.948, 10880... |
| 6 | 1159297729 | 85682581 | Frederikssund | localadmin | kommune | DK | whosonfirst-data-admin-dk | 55.793708 | 11.985618 | 55.714237 | 11.845962 | 55.937969 | 12.231291 | 12 | None | 13.5 | 18 | 2023-09-28 | None | 45223 | 85633121 | 85682581 | None | dk-geodk:code | 250 | None | None | None | None | None | None | Frederikssund | Frederikssund | None | None | None | None | None | Frederikssund | None | None | None | None | Frederikssund | Frederikssund | None | Фредерикссунн | Frederikssund | Frederikssund | None | None | None | None | None | dk-geodk | MULTIPOLYGON (((1043096.801 6177341.505, 10430... |
| 7 | 1159297733 | 85682581 | Albertslund | localadmin | kommune | DK | whosonfirst-data-admin-dk | 55.679988 | 12.347031 | 55.641959 | 12.311000 | 55.709789 | 12.396280 | 12 | None | 13.5 | 18 | 2023-09-28 | None | 27731 | 85633121 | 85682581 | None | dk-geodk:code | 165 | None | None | None | None | None | None | None | Albertslund | None | None | None | None | None | None | None | None | None | None | None | Albertslund | None | None | Albertslund | Albertslund | None | None | None | None | None | dk-geodk | POLYGON ((1065542.084 6171219.759, 1065519.958... |
| 8 | 1159297735 | 85682597 | Samso | localadmin | kommune | DK | whosonfirst-data-admin-dk | 55.818602 | 10.581390 | 55.764339 | 10.520158 | 56.002364 | 10.793492 | 12 | None | 13.5 | 18 | 2023-09-28 | None | 3657 | 85633121 | 85682597 | None | dk-geodk:code | 741 | None | None | None | None | None | None | None | Samso | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | dk-geodk | MULTIPOLYGON (((956826.597 6195325.368, 956823... |
| 9 | 1159297737 | 85682589 | Guldborgsund | localadmin | kommune | DK | whosonfirst-data-admin-dk | 54.814822 | 11.950360 | 54.559078 | 11.530032 | 54.971843 | 12.165719 | 12 | None | 13.5 | 18 | 2023-09-28 | None | 60722 | 85633121 | 85682589 | None | dk-geodk:code | 376 | None | None | None | None | None | None | Guldborgsund | Guldborgsund | None | None | Guldborgsund | None | None | None | None | None | グルドボースンド海峡 | None | Guldborg Sund | None | None | None | Guldborgsund | Guldborg Sund | None | None | None | None | 古爾德堡海峽 | dk-geodk | MULTIPOLYGON (((1027998.825 6079346.001, 10280... |
denmark_municipalities['parent_id'].value_counts() # a qué región pertenece cada localidad
parent_id 85682581 29 85682575 22 85682597 19 85682589 17 85682593 11 -1 1 Name: count, dtype: int64
denmark_municipalities[denmark_municipalities['parent_id']==85682581].plot()
- Use Unary UNION with those polygons.
parent_id_union=denmark_municipalities[denmark_municipalities['parent_id']==85682581].unary_union
parent_id_union
- Create a geoDF with the result.
parentGDF = gpd.GeoDataFrame(index=[0],data={'parent_id':85682581},
crs=denmark_municipalities.crs,
geometry=[parent_id_union])
parentGDF
| parent_id | geometry | |
|---|---|---|
| 0 | 85682581 | MULTIPOLYGON (((1042974.632 6177385.131, 10429... |
- Use dissolve with the same polygons, and create a geoDF.
denmark_municipalities.dissolve(by='parent_id').plot(facecolor='yellow', edgecolor='black',linewidth=0.2)
denmark_parentid_diss=denmark_municipalities.dissolve(by='parent_id')
denmark_parentid_diss #este es nuestro geo dataframe
| geometry | id | name | placetype | placelocal | country | repo | lat | lon | min_lat | min_lon | max_lat | max_lon | min_zoom | max_zoom | min_label | max_label | modified | is_funky | population | country_id | region_id | county_id | concord_ke | concord_id | iso_code | hasc_id | gn_id | wd_id | name_ara | name_ben | name_deu | name_eng | name_ell | name_fas | name_fra | name_heb | name_hin | name_hun | name_ind | name_ita | name_jpn | name_kor | name_nld | name_pol | name_por | name_rus | name_spa | name_swe | name_tur | name_ukr | name_urd | name_vie | name_zho | geom_src | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| parent_id | |||||||||||||||||||||||||||||||||||||||||||||||||||||||
| -1 | MULTIPOLYGON (((1244246.738 6139688.235, 12442... | 1394014155 | Christianso | localadmin | kommune | DK | whosonfirst-data-admin-dk | 55.320402 | 15.188798 | 55.317257 | 15.173827 | 55.330772 | 15.197400 | 12 | None | 13.5 | 18 | 2023-09-28 | None | 84 | -1 | -1 | None | dk-geodk:code | 411 | None | None | None | None | None | None | None | Christianso | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | dk-geodk |
| 85682575 | MULTIPOLYGON (((882439.770 6083032.984, 882441... | 1159297747 | Haderslev | localadmin | kommune | DK | whosonfirst-data-admin-dk | 55.232202 | 9.374950 | 55.125549 | 8.884642 | 55.382343 | 9.778885 | 12 | None | 13.5 | 18 | 2023-09-28 | None | 55670 | 85633121 | 85682575 | None | dk-geodk:code | 510 | None | None | None | None | أودنسه | সভেন্ডবোর্গ | Haderslev | Haderslev | Λάνγκελαντ | ادنسه | Kerteminde | אודנסה | ओडिन्से | Vejen | Langeland | Haderslev | ランゲラン島 | 하데르슬레우 | Haderslev | Haderslev | Haderslev | Хадерслев | Haderslev | Haderslev | Haderslev | Гадерслев | اودنسے | Langeland | 凯特明讷 | dk-geodk |
| 85682581 | MULTIPOLYGON (((1042974.632 6177385.131, 10429... | 1159297723 | Lyngby-Taarbaek | localadmin | kommune | DK | whosonfirst-data-admin-dk | 55.781735 | 12.510815 | 55.759558 | 12.413286 | 55.809710 | 12.596912 | 12 | None | 13.5 | 18 | 2023-09-28 | None | 56214 | 85633121 | 85682581 | None | dk-geodk:code | 173 | None | None | None | None | كوبنهاغن | কোপেনহেগেন | Kopenhagen | Lyngby-Taarbaek | Κοπεγχάγη | کپنهاگ | Copenhague | קופנהגן | कोपनहेगन | Koppenhága | Kopenhagen | Copenaghen | コペンハーゲン | 코펜하겐 | Kopenhagen | Kopenhaga | Copenhaga | Копенгаген | Copenhague | Köpenhamn | Kopenhag | Копенгаген | کوپن ہیگن | Copenhagen | 哥本哈根 | dk-geodk |
| 85682589 | MULTIPOLYGON (((976998.632 6074588.187, 977002... | 1159297717 | Holbaek | localadmin | kommune | DK | whosonfirst-data-admin-dk | 55.675774 | 11.573848 | 55.508929 | 11.350962 | 55.809169 | 11.858369 | 12 | None | 13.5 | 18 | 2023-09-28 | None | 71541 | 85633121 | 85682589 | None | dk-geodk:code | 316 | None | None | None | None | لولاند | রসক্লাইড | Guldborgsund | Holbaek | Λόλλαντ | لولند | Guldborgsund | ליירה | रोसकिल्ड | Slagelse | Lolland | Slagelse | グルドボースンド海峡 | 슬라겔세 | Guldborg Sund | Slagelse | Slagelse | Слагельсе | Guldborgsund | Guldborg Sund | Roskilde | Слагельсе | راسکیلے | Slagelse | 古爾德堡海峽 | dk-geodk |
| 85682593 | MULTIPOLYGON (((834510.776 6287190.389, 834508... | 1159297741 | Vesthimmerland | localadmin | kommune | DK | whosonfirst-data-admin-dk | 56.813064 | 9.364557 | 56.636899 | 9.068354 | 57.030598 | 9.659873 | 12 | None | 13.5 | 18 | 2023-09-28 | None | 36727 | 85633121 | 85682593 | None | dk-geodk:code | 820 | None | None | None | None | آلبورغ | ফ্রেডেরিকশাভন | Aalborg | Vesthimmerland | Άλμποργκ | آلبورگ | Aalborg | אולבורג | फ़्रेडेरिकशॉन | Aalborg | Aalborg | Aalborg | オールボー | 올보르 | Aalborg | Aalborg | Aalborg | Ольборг | Aalborg | Ålborg | Aalborg | Ольборг | آلبو | Aalborg | 奥尔堡 | dk-geodk |
| 85682597 | MULTIPOLYGON (((802338.295 6194475.607, 802338... | 1159297721 | Skanderborg | localadmin | kommune | DK | whosonfirst-data-admin-dk | 56.076149 | 9.872121 | 55.953987 | 9.633145 | 56.216936 | 10.092933 | 12 | None | 13.5 | 18 | 2023-09-28 | None | 62678 | 85633121 | 85682597 | None | dk-geodk:code | 746 | None | None | None | None | هرنينغ | ভিবর্গ | Skanderborg | Skanderborg | Ράντερς | هرنینگ | Skanderborg | אורהוס | वाइबोर्ग | Randers | Munisipalitas Herning | Skanderborg | ヘアニング | 헤르닝 | Skanderborg | Skanderborg | Skanderborg | Сканнерборг | Skanderborg | Skanderborg | Herning | Гернінг | آرہس | Herning | 斯坎訥堡自治市 | dk-geodk |
c. Dissolve and aggregate¶
In pandas, you have can aggregate data using some statistics. Dissolve can be used in the same way. Let me use the indicators layer:
indicators.head()
We have numerical columns, and a grouping column named region. Let's get some averages by region, while combining the polygons:
indicators.dissolve(
by="region",
aggfunc={
"COUNTRY": "count",
"fragility": ["mean"],
"co2": ["mean"],
"ForestRev_gdp": ["mean"]
},as_index=False,
)
Let me save and plot:
indicatorsByRegion=indicators.dissolve(
by="region",
aggfunc={
"COUNTRY": "count",
"fragility": ["mean"],
"co2": ["mean"],
"ForestRev_gdp": ["mean"]
},as_index=False,
)
indicatorsByRegion.plot(column = 'region')
Without renaming, you can request a choropleth:
indicatorsByRegion.plot(column =('fragility', 'mean'),scheme='quantiles', cmap='OrRd',
legend=True,
legend_kwds={"title": "Fragility",'loc': 'lower left'},
edgecolor='black',linewidth=0.2,
figsize=(15, 10))
brazil_5641.centroid
# then
brazil_5641.centroid.x.values[0],brazil_5641.centroid.y.values[0]
airports_5641
# coordinates
centroidX,centroidY=brazil_5641.centroid.x.values[0],brazil_5641.centroid.y.values[0]
# subsets of medium airports
Brazil_AirTopLeft=airports_5641[airports_5641.kind=='medium_airport'].cx[:centroidX,centroidY:]
Brazil_AirTopRight=airports_5641[airports_5641.kind=='medium_airport'].cx[centroidX:,centroidY:]
Brazil_AirBottomLeft=airports_5641[airports_5641.kind=='medium_airport'].cx[:centroidX,:centroidY]
Brazil_AirBottomRight=airports_5641[airports_5641.kind=='medium_airport'].cx[centroidX:,:centroidY]
Let me plot the subsets:
base=Brazil_AirTopLeft.plot(facecolor='grey', alpha=0.4)
Brazil_AirTopRight.plot(ax=base,facecolor='orange', alpha=0.4)
Brazil_AirBottomLeft.plot(ax=base,facecolor='green', alpha=0.4)
Brazil_AirBottomRight.plot(ax=base,facecolor='red', alpha=0.4)
Notice we have simple points in each subset:
Brazil_AirBottomLeft
In this situation, you can not make a convex hull:
Brazil_AirBottomLeft.convex_hull.plot()
You first need to dissolve, and then you create a hull, an envelope of convex angles:
Brazil_AirBottomLeft.dissolve().convex_hull.plot()
As we saw, the convex hull is a polygon:
Brazil_AirBottomLeft.dissolve().convex_hull
What if we the polygons had not been previously combined?
brazil_municipalities.cx[:centroidX,:centroidY].convex_hull.plot(edgecolor='red')
That is, you get a convex hull for each geometry.
We can also use union before creating a convex hull:
# just the union
large_airport=airports_5641[airports_5641.kind=='large_airport']
large_airport.unary_union
# hull of the union
large_airport.unary_union.convex_hull
Let's turn the GS into a GDF:
LargeAirport_hull= gpd.GeoDataFrame(index=[0],
crs=large_airport.crs,
geometry=[large_airport.unary_union.convex_hull])
LargeAirport_hull['name']='large airports hull' # optional
# then
LargeAirport_hull
Let's use the GDF in plotting:
base=brazil_5641.plot(facecolor='yellow')
large_airport.plot(ax=base)
LargeAirport_hull.plot(ax=base,facecolor='green',
edgecolor='white',alpha=0.4,
hatch='X')
Exercise 7¶
Select some points from your maps.
Create the convex hull for those points.
Turn the hull into a GDF.
Plot the hull on top of the country.
# the north
MunisN_brazil=brazil_municipalities.cx[:,centroidY:]
# the south
MunisS_brazil=brazil_municipalities.cx[:,:centroidY]
# the west
MunisW_brazil=brazil_municipalities.cx[:centroidX,:]
# the east
MunisE_brazil=brazil_municipalities.cx[centroidX:,:]
Let me plot:
base=MunisN_brazil.plot(facecolor='yellow', edgecolor='black',linewidth=0.2, alpha=0.6)
MunisS_brazil.plot(facecolor='grey', edgecolor='black',linewidth=0.2,ax=base, alpha=0.4)
base=MunisE_brazil.plot(facecolor='yellow', edgecolor='black',linewidth=0.2, alpha=0.6)
MunisW_brazil.plot(facecolor='grey', edgecolor='black',linewidth=0.2,ax=base, alpha=0.4)
Intersection¶
We keep what is common between GeoDFs:
munisNS_brazil=MunisN_brazil.overlay(MunisS_brazil, how="intersection",keep_geom_type=True)
munisNS_brazil.plot()
This is similar to a spatial join:
MunisN_brazil.sjoin(MunisS_brazil, how="inner", predicate='contains').plot()
# keeping the overlay
munisWE_brazil=MunisW_brazil.overlay(MunisE_brazil, how="intersection",keep_geom_type=True)
munisWE_brazil.plot(edgecolor='white',linewidth=0.1)
Union¶
Let me unite the previous two GeoDFs. First, take a look at each one:
munisNS_brazil.info()
munisWE_brazil.info()
The overlay combines the geometries, but not the attributes. Let me subset and show you:
keep=['ADM0_EN_1','ADM1_PT_1','ADM2_PT_1','geometry']
munisNS_brazil=munisNS_brazil.loc[:,keep]
munisWE_brazil=munisWE_brazil.loc[:,keep]
# now
munisNS_brazil.overlay(munisWE_brazil,how="union",keep_geom_type=True)
As you see, geometries are fine, but not attributes. It is strictly NOT appending the GeoDFs:
# appending
pd.concat([munisNS_brazil,munisWE_brazil],ignore_index=True)
You will append if you are interested in the keeping the attributes. But you just do the overlay if you are planing to combine final results:
munisNS_brazil.dissolve().overlay(munisWE_brazil.dissolve(), how="union",keep_geom_type=True).dissolve().plot()
Let me create an object to save the previous result:
muniMidBrazil=munisNS_brazil.dissolve().overlay(munisWE_brazil.dissolve(), how="union",keep_geom_type=True).dissolve()
muniMidBrazil
# some cleaning
muniMidBrazil['zone']='middles'
muniMidBrazil=muniMidBrazil.loc[:,['ADM0_EN_1_1','zone','geometry']]
muniMidBrazil
Difference¶
Here, you keep what belongs to the GeoDF to left that is not in the GeoDF to the right:
# with the municipalities
brazil_municipalities.overlay(muniMidBrazil, how='difference').plot()
Symmetric Difference¶
This is the opposite to intersection, you keep what is not in the intersection:
MunisN_brazil.overlay(MunisS_brazil, how="symmetric_difference",keep_geom_type=False).plot()
MunisW_brazil.overlay(MunisE_brazil, how="symmetric_difference",keep_geom_type=False).plot()
Exercise 8¶
- Apply two of these operations to your maps.
- Apply two of these operations to the next maps:
# hulls for the mid size airports:
Brazil_AirTopLeft_hull=Brazil_AirTopLeft.dissolve().convex_hull
Brazil_AirTopRight_hull=Brazil_AirTopRight.dissolve().convex_hull
Brazil_AirBottomLeft_hull=Brazil_AirBottomLeft.dissolve().convex_hull
Brazil_AirBottomRight_hull=Brazil_AirBottomRight.dissolve().convex_hull
base = brazil_5641.plot(color='white', edgecolor='black') #unprojected
muniMidBrazil.plot(ax=base,facecolor='magenta',alpha=0.4) #unprojected
LargeAirport_hull.plot(ax=base,facecolor='purple',alpha=0.4)
Brazil_AirTopLeft_hull.plot(ax=base,facecolor='grey', alpha=0.4)
Brazil_AirTopRight_hull.plot(ax=base,facecolor='orange', alpha=0.4)
Brazil_AirBottomLeft_hull.plot(ax=base,facecolor='green', alpha=0.4)
Brazil_AirBottomRight_hull.plot(ax=base,facecolor='red', alpha=0.4)
# non valid
brazil_municipalities[~brazil_municipalities.is_valid]
# see the invalid:
brazil_municipalities[~brazil_municipalities.is_valid].plot()
It is difficult to see what is wrong. Let's get some information:
# what is wrong?
from shapely.validation import explain_validity, make_valid
explain_validity(brazil_municipalities[~brazil_municipalities.is_valid].geometry)
# varieties?
brazil_municipalities['validity']=[x.split('[')[0] for x in brazil_municipalities.geometry.apply(lambda x: explain_validity(x))]
brazil_municipalities['validity'].value_counts()
# solving the issue:
brazil_municipalities.drop(columns=['validity'],inplace=True)
brazil_municipalities_valid=brazil_municipalities.copy()
brazil_municipalities_valid['geometry'] = [make_valid(row) if not row.is_valid else row for row in brazil_municipalities_valid['geometry'] ]
#any invalid?
brazil_municipalities_valid[~brazil_municipalities_valid.is_valid]
The solution we got may help for some advanced techniques, but may also give us some extra trouble. Notice that once geopandas solved the problem, you have created collections:
pd.Series([type(x) for x in brazil_municipalities_valid.geometry]).value_counts()
However, the pressence of collections may be troublesome in shapefiles.